Skip to content

refactor(ntx-builder): simplify coordinator-actor messaging with Notify#1699

Open
SantiagoPittella wants to merge 9 commits intonextfrom
santiagopittella-ntx-simplify-actor-messages
Open

refactor(ntx-builder): simplify coordinator-actor messaging with Notify#1699
SantiagoPittella wants to merge 9 commits intonextfrom
santiagopittella-ntx-simplify-actor-messages

Conversation

@SantiagoPittella
Copy link
Collaborator

First task of #1694

  • Replace per-actor mpsc<Arc<MempoolEvent>> channels with Arc<Notify>. The DB is already the source of truth (since chore(ntx): replace in memory with sqlite database #1662), so actors only need a "re-check your state" signal and not the event payload. This removes all SendError handling, failed-actor cleanup, and the send() helper from the coordinator.
  • In TransactionInflight mode, actors now query the DB (is_transaction_resolved) to check if their awaited transaction was committed/reverted, since Notify doesn't carry payload.
  • Unify actor -> coordinator communication into a single ActorRequest enum over one mpsc channel. NotesFailed carries a oneshot ack to prevent a race condition where the actor could re-select notes before failure counts are persisted (context). CacheNoteScript remains fire-and-forget.

@SantiagoPittella SantiagoPittella added the no changelog This PR does not require an entry in the `CHANGELOG.md` file label Feb 23, 2026
let resolved = self.db
.is_transaction_resolved(account_id, awaited_id)
.await
.expect("should be able to check tx status");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to stop everything if a single actor fails on this basis?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an actor panics, the coordinator catches the error in coordinator.next() and logs it, but the system keeps running without the actor. Though know that you mentioned this, probably the best would be to return an specific error for this Db fail case.

/// before the failure counts are updated in the database.
async fn mark_notes_failed(&self, nullifiers: &[Nullifier], block_num: BlockNumber) {
let (ack_tx, ack_rx) = tokio::sync::oneshot::channel();
let _ = self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_ is type Result<(), SendError<ActorRequest>>. Are we sure we don't want to be handling that error here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with let _ = ack_rx.await; below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced them with .is_err checks

Copy link
Collaborator

@sergerad sergerad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just a few questions on errors we don't handle

Copy link
Collaborator

@igamigo igamigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some comments. Would be good to load test this even if lightly to see that it performs well

) -> Result<T, ActorShutdownReason> {
result.map_err(|err| {
tracing::error!(err = err.as_report(), account_id = %account_id, "{context}");
ActorShutdownReason::DbError(account_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this error variant include the miden_node_db::DatabaseError ?

use crate::store::StoreClient;

// ACTOR NOTIFICATION
/// Converts a database result into an `ActorShutdownReason` error, logging the error on failure.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs are a bit misleading (it's an internal fn so feel free to disregard):

Suggested change
/// Converts a database result into an `ActorShutdownReason` error, logging the error on failure.
/// Converts a database result mapping the error into an `ActorShutdownReason`, logging it on failure.

Comment on lines +252 to +256
let has_notes = match db_query(
account_id,
self.db.has_available_notes(account_id, block_num, self.max_note_attempts).await,
"failed to check for available notes",
) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, feel free to disregard: A pattern similar to

self.db.has_available_notes(...).map_err(|err| map_db_err(acc_id, "failed to check notes"))

reads a bit better IMO

Comment on lines +296 to +301
match request {
ActorRequest::NotesFailed { nullifiers, block_num, ack_tx } => {
if let Err(err) = self.db.notes_failed(nullifiers, block_num).await {
tracing::error!(err = %err, "failed to mark notes as failed");
}
let _ = ack_tx.send(());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be acked even if the db.notes_failed() call failed?

Comment on lines +165 to +168
ActorShutdownReason::DbError(account_id) => {
tracing::error!(account_id = %account_id, "Account actor shut down due to DB error");
Ok(())
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea not to update the registry and remove the actor handle when this happens? Took a quick look at the other branches and could nto find it either

Comment on lines +140 to 144
pub fn broadcast(&self) {
for handle in self.actor_registry.values() {
handle.notify.notify_one();
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this scale well? Feels like broadcasting to all actors will cost a lot of unnecessary queries. For instance, I saw that select_candidate_from_db queries account state and notes through 2 different queries in the DB. The account state one is probably not trivial though the other one might be.
I know this is what the PR was partially trying to change but wonder if sending the data here to filter in memory is worth it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I believe there is an low-hanging optimization opportunity with the 2 queries in the DB. You can check for notes first and bail if none are found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no changelog This PR does not require an entry in the `CHANGELOG.md` file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants